Abstract

Burkholderia pathogenicity relies on protein virulence factors to control and promote bacterial
internalization, survival, and replication within eukaryotic host cells. We recently used
yeast two-hybrid (Y2H) screening to identify a small set of novel Burkholderia proteins that
were shown to attenuate disease progression in an aerosol infection animal model using the
virulent Burkholderia mallei ATCC 23344 strain. Here, we performed an extended analysis
of primarily nine B. mallei virulence factors and their interactions with human proteins to map
out how the bacteria can influence and alter host processes and pathways. Specifically, we
employed topological analyses to assess the connectivity patterns of targeted host proteins,
identify modules of pathogen-interacting host proteins linked to processes promoting infectivity,
and evaluate the effect of crosstalk among the identified host protein modules. Overall,
our analysis showed that the targeted host proteins generally had a large number of interacting
partners and interacted with other host proteins that were also targeted by B. mallei proteins.
We also introduced a novel Host-Pathogen Interaction Alignment (HPIA) algorithm
and used it to explore similarities between host-pathogen interactions of B. mallei, Yersinia
pestis, and Salmonella enterica. We inferred putative roles of B. mallei proteins based on the
roles of their aligned Y. pestis and S. enterica partners and showed that up to 73% of the
predicted roles matched existing annotations. A key insight into Burkholderia pathogenicity
derived from these analyses of Y2H host-pathogen interactions is the identification of
eukaryotic-specific targeted cellular mechanisms, including the ubiquitination degradation
system and the use of the focal adhesion pathway as a fulcrum for transmitting mechanical
forces and regulatory signals. This provides the mechanisms to modulate and adapt the
host-cell environment for the successful establishment of host infections and
intracellular spread.
Author Summary

Burkholderia species need to manipulate many host processes and pathways in order to
establish a successful intracellular infection in eukaryotic host organisms. Burkholderia
mallei uses secreted virulence factor proteins as a means to execute host-pathogen interactions
and promote pathogenesis. While validated virulence factor proteins have been shown to
attenuate infection in animal models, their actual roles in modifying and influencing host
processes are not well understood. Here, we used host-pathogen protein-protein interactions
derived from yeast two-hybrid screens to study nine known B. mallei virulence factors
and map out potential virulence mechanisms. From the data, we derived both general
and specific insights into Burkholderia host-pathogen infectivity pathways. We showed
that B. mallei virulence factors tended to target multifunctional host proteins, proteins
that interacted with each other, and host proteins with a large number of interacting
partners. We also identified similarities between host-pathogen interactions of B. mallei,
Yersinia pestis, and Salmonella enterica using a novel host-pathogen interactions alignment
algorithm. Importantly, our data are compatible with a framework in which multiple
B. mallei virulence factors broadly influence key host processes related to
ubiquitin-mediated proteolysis and focal adhesion. This provides B. mallei the means to modulate and
adapt the host-cell environment to advance infection.


Introduction 

Burkholderia mallei is the causative agent of glanders, a highly contagious disease that
primarily affects horses, mules, and donkeys, but is also transmissible to other mammals
through direct contact with infected animals [1]. This host-adapted bacterium is equipped
with an extensive set of mechanisms for invasion and modulation of eukaryotic host-cell
environments. Key mechanisms of B. mallei pathogenicity are encoded in virulence factors
(proteins required for virulence) that control and promote pathogenic internalization, survival,
and replication within host cells [2,3]. While a number of B. mallei proteins associated with
pathogenicity have been characterized and mapped to adhesion, endosomal escape and
evasion of host-cell autophagy, actin-based motility, multi-nucleated giant cell formation,
replication, and cell-to-cell spread [3-7], the identities of their host targets are largely unknown,
and the underlying mechanisms by which the bacterial proteins affect these processes are
poorly understood.

In our previous study [8], we used a combined computational and experimental strategy to
systematically identify and characterize the interactions between B. mallei virulence factors and
their host targets. We employed several bioinformatics approaches to identify and select a
small number of putative and known virulence factors, and used yeast two-hybrid (Y2H) assays
to identify their interacting protein partners in human and murine hosts. The analysis of these
host-B. mallei protein-protein interactions (PPIs) allowed us to identify three novel B. mallei
ATCC 23344 virulence factors and show that they attenuated B. mallei virulence in mouse
aerosol challenge experiments. Although our PPI data contained extensive interactions
between multiple host proteins and B. mallei proteins, we did not fully explore these data to more
generally characterize B. mallei virulence mechanisms. Here, we performed a systematic
analysis of these interactions to investigate the mechanisms by which B. mallei virulence factors
interact with host proteins to establish infection, evade host immune responses, and spread
within the host. We evaluated whether the virulence factors target specific (non-random) host
proteins and processes and whether they jointly affect entry into and survival within the host
cells. Furthermore, we evaluated whether we could detect commonalities in Gram-negative
bacterial host-pathogen interactions among B. mallei, Yersinia pestis, and Salmonella enterica
virulence factors.

A number of studies have used small- or large-scale experiments to analyze Gram-negative
bacteria and their host interactions [9-13]. Although the identified interactions represent
only a fraction of all possible interactions between host and pathogen proteins (ranging from
less than 10 interactions to a few thousand PPIs), they have proved to be a valuable source of
information about bacterial pathogenicity mechanisms. Analyses of these host-pathogen PPI
datasets showed that virulence-associated pathogen proteins preferentially target host
proteins involved in biological processes essential for cell vitality, e.g., signaling, cell cycle, or
immune response [9-13]. Additionally, other studies demonstrated that similarities in
host-pathogen PPIs can be used to predict novel host proteins that are targeted by bacterial
proteins [14-16].

Our analysis showed that B. mallei virulence factors targeted host proteins that had a large
number of interacting partners and were closely connected to each other. In addition, the
analysis revealed specific host processes relevant to B. mallei virulence factors’ pathogenicity,
e.g., signaling and communication, protein modification and regulation, and cytoskeleton
organization, and suggested that virulence factors preferentially targeted multifunctional host
proteins, thereby affecting multiple host cellular processes simultaneously. When we used all
of our interaction data, including host interactions with putative but not validated B. mallei
virulence factors, we identified additional host processes and molecular pathways that were
previously experimentally associated with B. mallei pathogenicity [2, 17-21]. Moreover, our
evaluation of the relationship between targeted host proteins involved in different processes
and pathways supported a previously observed mechanism for bacterial interference with
eukaryotic hosts: virulence factors can focus interference by targeting key host proteins whose
effect can propagate through and influence multiple host processes and pathways [2, 17, 18].
Additionally, we introduced a novel Host-Pathogen Interaction Alignment (HPIA) algorithm
and used it to explore similarities between host-pathogen interactions of B. mallei, Y. pestis
[13], and Salmonella enterica subsp. enterica serovar Typhimurium [12]. Using the HPIA
algorithm, we identified a statistically significant number of functionally similar host-pathogen
interactions between these three PPI datasets. We inferred putative roles for B. mallei
proteins based on the role of their aligned Y. pestis and S. enterica partners and showed that
up to 73% of the putatively annotated B. mallei protein roles matched their existing
annotations.

Our findings show that host-pathogen interactions represent a rich source of information
about molecular mechanisms of pathogenicity. A key insight from these analyses into
Burkholderia pathogenicity is the concerted targeting of the ubiquitination degradation system and use
of the focal adhesion pathway as a fulcrum for signaling and changing cell morphology. These
mechanisms provide B. mallei with the ability to modulate and adapt the host-cell environment
to establish intracellular host infections.


Results/Discussion 

We created an inclusive set of human-B. mallei PPIs by merging human-B. mallei and
orthologous murine-B. mallei protein interaction data identified in our previous Y2H screens [8]. The
resulting dataset consisted of 1,235 unique interactions between 21 B. mallei and 828 human
proteins. Fig. 1 shows these interactions and their Y2H-library origins. It also shows that the
majority of the B. mallei proteins interacted with unique host proteins, i.e., 615 (74%) of host
proteins interacted with a single B. mallei protein. Importantly, the bulk of the host-B. mallei
interactions (72% or 890 interactions) involved nine known B. mallei virulence factors: PilA,
BimA, BopA, BipD, BipB, BsaU, BMAA1865, TssN, and BMAA0553 (Table 1) [22]. These
nine B. mallei virulence factors interacted with 663 human proteins (80% of all identified host
proteins), implying that the captured data were largely reflective of host-pathogen virulence
mechanisms. We start our analysis by assessing the characteristics of the host proteins targeted
by these virulence factors.

Fig 1. B. mallei-human host-pathogen protein interactions. The set of human-B. mallei protein-protein
interactions (PPIs) was created by merging human-B. mallei and orthologous murine-B. mallei interaction
data. The set consists of 1,235 unique interactions (gray and purple lines) between 21 B. mallei (green
hexagons) and 828 human proteins (pink, blue, and purple circles). Known virulence factors are also
indicated in the graph. Node legend: human library; murine library; human and murine libraries.

doi:10.1371/journal.pcbi.1004088.g001
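The dataset assembly described above can be illustrated with a minimal Python sketch. It assumes that murine hits are mapped to human proteins through a simple ortholog lookup table; the function names, the lookup table, and all identifiers in the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def merge_ppis(human_ppis, murine_ppis, mouse_to_human):
    """Combine human hits with murine hits mapped onto human orthologs."""
    merged = set(human_ppis)
    for pathogen, mouse_protein in murine_ppis:
        human_ortholog = mouse_to_human.get(mouse_protein)
        if human_ortholog is not None:  # drop murine hits without an ortholog
            merged.add((pathogen, human_ortholog))
    return merged


def single_partner_hosts(ppis):
    """Host proteins that interact with exactly one pathogen protein."""
    partners = defaultdict(set)
    for pathogen, host in ppis:
        partners[host].add(pathogen)
    return {host for host, pathogens in partners.items() if len(pathogens) == 1}


# Toy identifiers (hypothetical, not from the study's data).
human_hits = {("Bm1", "HOST_A"), ("Bm2", "HOST_B")}
murine_hits = {("Bm1", "host_a"), ("Bm3", "host_c"),
               ("Bm2", "host_c"), ("Bm4", "host_x")}
orthologs = {"host_a": "HOST_A", "host_c": "HOST_C"}

merged = merge_ppis(human_hits, murine_hits, orthologs)
unique_hosts = single_partner_hosts(merged)
print(len(merged), sorted(unique_hosts))
```

Storing interactions as a set of (pathogen, host) pairs removes duplicates that arise when the same interaction is seen in both libraries. Applied to the reported counts, the same bookkeeping gives the stated shares: 615/828 is about 74%, 890/1,235 is about 72%, and 663/828 is about 80%.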
Table 1. Known B. mallei virulence factors that have been shown to attenuate the disease in animal models.

Protein | Secretion System | Description | Role in Virulence | Animal model | Ref.
BMA0278 (PilA) | 2 | Type IV pilin | Bacterial adhesion | BALB/c | [62]
BMAA0749 (BimA)* | 5 | Trimeric autotransporter adhesin | Actin-based motility and intracellular/intercellular spread | Murine | [4]
BMAA1521 (BopA)* | 3 | Secreted effector/translocon | Intracellular survival | BALB/c | [63]
BMAA1528 (BipD)* | 3 | Transcriptional regulation; Needle tip protein | Invasion of non-phagocytic cells | BALB/c, C57BL/6 | [64]
BMAA1531 (BipB)* | 3 | Secreted effector/translocon | Multi-nucleated giant cell formation | BALB/c | [65]
BMAA1538 (BsaU)* | 3 | Secretion apparatus; Needle-length control | Bacterial escape from endocytic vesicles | BALB/c | [66]
BMAA1865 | 3 | N/A | Modulation of host ubiquitination; Phagosome escape | BALB/c | [8]
BMAA0728 (TssN) | 6 | N/A | Interference with host actin cytoskeleton rearrangement | BALB/c | [8]
BMAA0553 | 2 | Ser/Thr protein phosphatase | Interference with host actin cytoskeleton rearrangement | BALB/c | [8]

*Proteins that have been experimentally linked to a particular secretion system.

doi:10.1371/journal.pcbi.1004088.t001


Characteristics  of  host  proteins  interacting  with  known  B.  mallei 
virulence  factors 

B. mallei virulence factors are associated with multiple pathogenic mechanisms of action
(Table 1) [3-7], but their direct molecular interactions are not well delineated. First, we applied
functional enrichment analyses based on Gene Ontology (GO) annotation data [23] to assess
the characteristics of the human proteins targeted by the nine virulence factors. Table 2 shows
that these virulence factors interacted with a statistically significant number of human proteins
that were associated with 1) protein ubiquitination and ubiquitin ligase activity, 2) vesicle
organization, and 3) protein complexes located in the cytoskeleton, in lysosomes, and in the nuclear
lumen. These results were consistent with the experimentally observed pathogen interference
with host cytoskeleton organization and ubiquitination levels [2, 3, 19-21, 24].
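To make the enrichment step concrete, the following Python sketch scores one GO term with a one-sided hypergeometric test and then adjusts a list of p-values with the Benjamini-Hochberg procedure used for the FDR column of Table 2. The choice of the hypergeometric test is an assumption made for illustration, since the test statistic is not specified in this section; the function names and toy numbers are likewise hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n proteins are drawn from N, of which K carry the term."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example (hypothetical numbers): 20 background proteins, 5 annotated
# with the term, 4 targeted proteins, 3 of which carry the annotation.
enrichment_p = hypergeom_sf(k=3, N=20, K=5, n=4)
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(enrichment_p, fdr)
```

A term would be reported as enriched when its adjusted value falls at or below the chosen FDR threshold, which is 0.05 for the entries listed in Table 2.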

Next, we examined the gross topological properties of the network of interactions formed
by the B. mallei-targeted host proteins and their interacting host partners, regardless of
whether these proteins did or did not interact with the B. mallei proteins. We mapped the identified
host proteins interacting with B. mallei onto a human PPI network [25] consisting of 76,043
physical PPIs among 11,688 proteins. Of the 663 human proteins interacting with the nine
B. mallei virulence factors, approximately 75% (498) were present in our human PPI network.
This set contained proteins that had, on average, a significantly larger number of interacting
partners per protein (19.5 vs. 13.0) than would be expected from a corresponding random
selection of proteins from the entire human PPI network (Table 3). Among the highest-interacting
host proteins targeted by the virulence factors, we found the adapter protein YWHAG
(14-3-3 protein gamma) with 376 interactions. This protein, an interacting partner of BimA, has
been implicated in the regulation of a large spectrum of signaling pathways [26]. Further
topological measures associated with the set of 498 proteins, such as their clustering coefficient
(a measure of interactions among nearest neighbors), were not different from the random
selection (Table 3). We observed small effects on the length of the shortest path between any two
proteins in the set, but it was unclear how to associate these topological parameters with
B. mallei virulence.
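The comparison against random selections can be sketched as follows. The code computes the mean number of interacting partners for a targeted set and for repeated random selections of the same size, and derives a one-sided empirical p-value. The add-one estimator, the number of repetitions, and the toy network are assumptions for illustration and may differ from the procedure behind Table 3.

```python
import random
from statistics import mean


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (protein, protein) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree_enrichment(adjacency, targeted, n_random=1000, seed=0):
    """Compare the mean degree of targeted proteins with random selections."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    observed = mean(len(adjacency[p]) for p in targeted)
    random_means = []
    for _ in range(n_random):
        sample = rng.sample(nodes, len(targeted))
        random_means.append(mean(len(adjacency[p]) for p in sample))
    at_least = sum(m >= observed for m in random_means)
    p_value = (at_least + 1) / (n_random + 1)  # add-one empirical estimate
    return observed, mean(random_means), p_value


# Toy network (hypothetical): one hub joined to ten otherwise unconnected proteins.
toy_edges = [("hub", f"leaf{i}") for i in range(10)]
toy_network = build_adjacency(toy_edges)
observed, expected, p_value = degree_enrichment(toy_network, ["hub", "leaf0"])
print(observed, round(expected, 2), round(p_value, 3))
```

Sampling sets of the same size as the targeted set keeps the comparison fair, because the spread of a mean degree depends on how many proteins are averaged.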
Table 2. Enrichment of Gene Ontology (GO) terms for human proteins interacting with B. mallei virulence factors.

Type | GO ID | Description | Number of proteins | p-value (original) | p-value (FDR)
GO Biological Processes | GO:0043161 | Proteasomal ubiquitin-dependent protein catabolic process | 21 | 4.8 × 10^-? | 0.00
GO Biological Processes | GO:0006457 | Protein folding | 18 | 1.2 × 10^-? | 0.03
GO Biological Processes | GO:0000209 | Protein polyubiquitination | 14 | 2.0 × 10^-? | 0.04
GO Biological Processes | GO:0016050 | Vesicle organization | 11 | 2.5 × 10^-? | 0.05
GO Molecular Functions | GO:0004842 | Ubiquitin-protein ligase activity | 18 | 3.1 × 10^-? | 0.03
GO Molecular Functions | GO:0003729 | mRNA binding | 10 | 5.1 × 10^-? | 0.05
GO Molecular Functions | GO:0031072 | Heat shock protein binding | 10 | 6.3 × 10^-? | 0.05
GO Cellular Localizations | GO:0000151 | Ubiquitin ligase complex | 13 | 3.7 × 10^-? | 0.01
GO Cellular Localizations | GO:0015629 | Actin cytoskeleton | 24 | 4.8 × 10^-? | 0.01
GO Cellular Localizations | GO:0030529 | Ribonucleoprotein complex | 30 | 7.8 × 10^-? | 0.02
GO Cellular Localizations | GO:0043220 | Schmidt-Lanterman incisure | 3 | 8.5 × 10^-? | 0.02
GO Cellular Localizations | GO:0034663 | Endoplasmic reticulum chaperone complex | 2 | 1.3 × 10^-? | 0.03
GO Cellular Localizations | GO:0030135 | Coated vesicle | 19 | 1.6 × 10^-? | 0.03
GO Cellular Localizations | GO:0031371 | Ubiquitin conjugating enzyme complex | 3 | 2.3 × 10^-? | 0.04
GO Cellular Localizations | GO:0005764 | Lysosome | 19 | 2.5 × 10^-? | 0.04
GO Cellular Localizations | GO:0032838 | Cell projection cytoplasm | 4 | 2.7 × 10^-? | 0.04
GO Cellular Localizations | GO:0030131 | Clathrin adaptor complex | 5 | 3.4 × 10^-? | 0.05
GO Cellular Localizations | GO:0000803 | Sex chromosome | 4 | 3.4 × 10^-? | 0.05
GO Cellular Localizations | GO:0070971 | Endoplasmic reticulum exit site | 2 | 3.8 × 10^-? | 0.05
GO Cellular Localizations | GO:0031981 | Nuclear lumen | 80 | 4.3 × 10^-? | 0.05
GO Cellular Localizations | GO:0030128 | Clathrin coat of endocytic vesicle | 3 | 4.6 × 10^-? | 0.05

FDR: False discovery rate calculated using Benjamini and Hochberg multiple test correction [45].

doi:10.1371/journal.pcbi.1004088.t002


Table 3. Topological properties of human proteins interacting with B. mallei. We evaluated the following properties of the host proteins that
interacted with B. mallei proteins based on the human protein-protein interaction (PPI) network [25]: the number of these host proteins in the human PPI
network (N_P); the average number of interacting partners (in the human PPI network) of each host protein (D); the clustering coefficient, i.e., the number of
interactions among the nearest neighbors (C); the average shortest path between any two proteins in the set (SP); the average number of interacting
partners in the human PPI network where both partners interact with B. mallei proteins (D_I); and the number of host proteins in the largest connected
component (N_LCC). The top three rows show the results for the host proteins present in the PPI network that interacted with the nine known virulence factors,
whereas the three lower rows correspond to host proteins that interacted with all 21 tested B. mallei proteins from the yeast two-hybrid screening (known
and putative virulence factors). The results for the randomly selected (498 or 619) human proteins from the entire human PPI network (All PPIs) were
generated through 10^? random repetitions to create averages and standard deviations. The indicated p-values correspond to the probability of the
observed properties being different from the randomly selected set from all PPIs.

Set | N_P | D (SD) | C (SD) | SP (SD) | D_I (SD) | N_LCC (SD)
Known virulence factors
PPIs | 498 | 19.5 | 0.15 | 3.41 | 0.65 | 202
All PPIs | 498 | 13.0 (1) | 0.17 (0.01) | 3.70 (0.04) | 0.28 (0.05) | 80 (29)
p-value | - | 8.8 × 10^-? | 0.33 | 2.0 × 10^-? | 1.6 × 10^-? | 3.3 × 10^-?
Known and putative virulence factors
PPIs | 619 | 19.5 | 0.15 | 3.40 | 0.85 | 284
All PPIs | 619 | 13.0 (1) | 0.17 (0.01) | 3.70 (0.04) | 0.35 (0.06) | 136 (35)
p-value | - | 4.3 × 10^-? | 0.27 | 1.0 × 10^-? | 1.0 × 10^-? | 2.2 × 10^-?

SD: standard deviation.

doi:10.1371/journal.pcbi.1004088.t003
Next, we examined human protein interactions where both proteins individually interacted
with one or more B. mallei proteins. For the 498-protein set, we found 202 unique proteins
that participated in 325 human protein-protein interactions. In comparison with randomly
selected proteins, the B. mallei-targeted proteins were engaged in a significantly larger number of
these interactions (0.65 vs. 0.28 on a per-protein basis). A further examination of sets of
connected human proteins that also interacted with the virulence factors revealed the presence of
a single, large connected component, i.e., a sub-network in which a path connects any two
proteins to each other. This largest connected component was composed of 202 proteins and
contained the majority (95%) of the 325 interactions between the human proteins interacting with
B. mallei (Table 3). The other 11 connected components consisted of five or fewer proteins, an
observation that was not significantly different from a random selection of proteins (data not
shown). We found a five-fold increase, from 0.28 to 1.53 (0.95 × 325/202), in the number of
human PPIs for each protein in the largest connected component that were all targeted by the
virulence factors, compared to a random selection of proteins. These results suggest that a
property of the B. mallei virulence phenotype is to target well-connected host proteins in a
unique set of interconnected host proteins. Next, we used this property to expand on our initial
set of GO annotations to better characterize B. mallei infectivity and pathogenesis.
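The connected-component analysis above can be sketched in a few lines of Python. The code keeps only host-host interactions whose two partners are both targeted, finds connected components by breadth-first search, and counts interactions per protein in the largest component. The function names and the toy network are hypothetical; the last line only rechecks the arithmetic quoted in the text.

```python
from collections import deque


def targeted_edges(edges, targeted):
    """Keep host-host interactions whose two partners are both targeted."""
    return {frozenset((a, b)) for a, b in edges
            if a != b and a in targeted and b in targeted}


def connected_components(edges):
    """Connected components (largest first) found by breadth-first search."""
    adjacency = {}
    for edge in edges:
        a, b = tuple(edge)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), {start}
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node] - seen:
                seen.add(neighbor)
                component.add(neighbor)
                queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy network (hypothetical): X is not targeted, F only has a self-interaction.
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E"), ("A", "X"), ("F", "F")]
toy_targeted = {"A", "B", "C", "D", "E", "F"}

kept = targeted_edges(toy_edges, toy_targeted)
components = connected_components(kept)
largest = components[0]
per_protein = len([e for e in kept if e <= largest]) / len(largest)

# Arithmetic quoted in the text: 95% of 325 interactions over 202 proteins.
reported = round(0.95 * 325 / 202, 2)
print(sorted(largest), per_protein, reported)
```

The per-protein value of 0.65 reported for the full 498-protein set follows the same logic, since 325/498 is about 0.65.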


B.  mallei  virulence  factors  target  interactions  among  host  proteins 

The analysis of the interactions between the virulence factors and host proteins showed that
the targeted human proteins were highly likely to interact among themselves. We hypothesized
that interactions among these host proteins are equally important targets as the proteins
themselves and could be used to shed light on how virulence factors exert their influence. As detailed
in Materials and Methods, we used the largest connected component identified above to detect
93 sets of human PPIs in which, in each set, all human proteins interacted with at least one of
the nine known B. mallei virulence factors and had the same GO biological process
annotations; we denoted these sets interaction modules.
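One plausible way to group interacting proteins that share an annotation is sketched below; it is an illustration under stated assumptions and not the procedure from Materials and Methods, which is not reproduced in this section. For each GO term, the sketch joins annotated proteins that interact with each other using a union-find structure, and then lists proteins that fall into modules for more than one term. The minimum module size of three, the function names, and the toy data are hypothetical, and the statistical significance testing is omitted.

```python
from collections import defaultdict


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def interaction_modules(edges, go_terms, min_size=3):
    """Group interacting proteins that share a GO term into candidate modules."""
    modules = {}
    for term in sorted(set().union(*go_terms.values())):
        members = {p for p, terms in go_terms.items() if term in terms}
        parent = {p: p for p in members}
        for a, b in edges:
            if a in members and b in members:
                parent[find(parent, a)] = find(parent, b)
        groups = defaultdict(set)
        for p in members:
            groups[find(parent, p)].add(p)
        kept = [g for g in groups.values() if len(g) >= min_size]
        if kept:
            modules[term] = sorted(kept, key=len, reverse=True)
    return modules


# Toy data (hypothetical protein names and annotations).
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")]
toy_go = {
    "P1": {"signaling"},
    "P2": {"signaling", "ubiquitination"},
    "P3": {"signaling", "ubiquitination"},
    "P4": {"ubiquitination"},
    "P5": {"signaling"},
    "P6": {"signaling"},
}

modules = interaction_modules(toy_edges, toy_go)
membership = defaultdict(set)
for term, groups in modules.items():
    for group in groups:
        for protein in group:
            membership[protein].add(term)
multifunctional = sorted(p for p, terms in membership.items() if len(terms) > 1)
print(modules, multifunctional)
```

In the toy data, P2 and P3 belong to modules for both terms, which mirrors how proteins shared between modules can point to multifunctional host proteins.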

Table 4 shows that these interaction modules were associated with biological processes related
to ligase activity, ubiquitination, protein modification, transcription and translation, immune
response, signaling, cytoskeleton organization, development, and mRNA processing. Overall, the
identified biological processes were similar to the ones identified when interactions among host
proteins were not taken into account; however, they provided an improved annotation
granularity. For example, the interaction modules allowed us to identify a biological process termed
“positive regulation of protein ubiquitination” instead of just “protein ubiquitination.” Importantly,
the analysis provided evidence of a much larger effort to target intracellular host signaling
processes, in particular those related to the immune response. Fig. 2 shows the subset of 116 proteins
and 163 interactions from the largest connected component that were part of the 93 identified
interaction modules and the location of six interaction modules. Each of the interaction modules
constituting ubiquitination and ligase activity, transcriptional regulation, immune response,
cytoskeleton organization, and mRNA processing consisted of proteins and interactions that were
closely grouped together in the largest connected component (Fig. 2A-E). Fig. 2 also shows that
some human proteins are a part of multiple interaction modules, suggesting that B. mallei
interacts with multifunctional or “moonlighting” host proteins [27]. Multifunctional proteins have
been associated with such neurological disorders as Alzheimer’s and Parkinson’s diseases [28],
as well as with bacterial virulence in Helicobacter pylori, Mycobacterium tuberculosis, and
Streptococcus pneumoniae [29]. Given the multifaceted role of these proteins in enzymatic
catalysis, signal transduction, transcriptional regulation, apoptosis, motility, and growth [30, 31],
interactions with them suggest an avenue for B. mallei to simultaneously interfere with multiple
host-cellular processes to facilitate invasion and survival. In particular, Fig. 2F shows the
largest interaction module associated with biological processes linked to multifunctional
proteins. This interaction module contained 54 interactions among 44 human proteins associated
with various types of regulation (regulation of gene expression, cytokinesis, or apoptosis), signal
transduction (GTPase mediated signal transduction and Janus kinase/signal transduction), and
response triggering (immune response and response to stress). Additionally, this module
contained host-interacting partners of eight out of the nine B. mallei virulence factors from our set,
lacking only BopA. These results suggest that B. mallei virulence factors target multifunctional
host proteins to simultaneously interfere with multiple host processes required for normal
cellular function.

Table 4. Enrichment of Gene Ontology (GO) biological processes in host subnetworks. LCC represents the number of proteins in the largest
connected component annotated with a given term; LIM represents the number of proteins in the largest interaction module for a given term; p_GO denotes
the probability of the same number of proteins as the LCC being annotated with a given GO term solely through a random selection; p_RP denotes the
probability that a given number of proteins as the LIM are annotated with a given GO term solely through random selection; p_RN represents the probability
that a given number of proteins as the LIM are annotated with a given GO term solely through random selection in a random network that has the same
degree distribution as our human network. All p-values were assessed using the Benjamini-Hochberg method to meet a maximum false discovery rate
threshold of 5% [45]. The table contains only the lowest-level GO terms; the complete data are available in S1 Table.

Category | GO ID | Description | LCC | LIM | p_GO | p_RP | p_RN
Protein modification | GO:0051443 | Positive regulation of ubiquitin-protein ligase activity | 5 | 3 | 3.6 × 10^-? | 8.2 × 10^-? | 4.5 × 10^-?
Protein modification | GO:0051351 | Positive regulation of ligase activity | 5 | 3 | 4.1 × 10^-? | 8.2 × 10^-? | 4.5 × 10^-?
Protein modification | GO:0000209 | Protein polyubiquitination | 10 | 8 | 0.0 | 0.0 | 0.0
Protein modification | GO:0016574 | Histone ubiquitination | 3 | 3 | 5.8 × 10^-? | 6.0 × 10^-? | 3.7 × 10^-?
Protein modification | GO:0051248 | Negative regulation of protein metabolic process | 15 | 10 | 7.0 × 10^-? | 1.0 × 10^-? | 2.0 × 10^-?
Cell cycle | GO:0051320 | S phase | 6 | 3 | 1.0 × 10^-? | 1.0 × 10^-? | 1.7 × 10^-?
Cell cycle | GO:0051437 | Positive regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle | 4 | 3 | 1.0 × 10^-? | 7.4 × 10^-? | 2.4 × 10^-?
mRNA processing | GO:0051028 | mRNA transport | 5 | 3 | 9.5 × 10^-? | 3.2 × 10^-? | 6.4 × 10^-?
mRNA processing | GO:0000398 | mRNA splicing, via spliceosome | 8 | 6 | 7.3 × 10^-? | 1.0 × 10^-? | 1.0 × 10^-?
Transcription and translation | GO:0006355 | Regulation of transcription, DNA-dependent | 45 | 15 | 7.4 × 10^-? | 6.0 × 10^-? | 4.3 × 10^-?
Transcription and translation | GO:0006413 | Translational initiation | 8 | 5 | 9.0 × 10^-? | 0.0 | 0.0
Signaling and immune response | GO:0007154 | Cell communication | 71 | 44 | 5.7 × 10^-? | 0.0 | 4.4 × 10^-?
Signaling and immune response | GO:0023052 | Signaling | 71 | 44 | 2.7 × 10^-? | 0.0 | 4.4 × 10^-?
Signaling and immune response | GO:0035556 | Intracellular signal transduction | 34 | 17 | 7.3 × 10^-? | 0.0 | 1.1 × 10^-?
Signaling and immune response | GO:0007167 | Enzyme linked receptor protein signaling pathway | 21 | 8 | 1.6 × 10^-? | 1.6 × 10^-? | 6.8 × 10^-?
Signaling and immune response | GO:0050852 | T cell receptor signaling pathway | 7 | 4 | 4.0 × 10^-? | 1.1 × 10^-? | 3.3 × 10^-?
Signaling and immune response | GO:0016032 | Viral reproduction | 13 | 4 | 2.0 × 10^-? | 5.0 × 10^-? | 2.1 × 10^-?
Signaling and immune response | GO:0050688 | Regulation of defense response to virus | 5 | 3 | 1.8 × 10^-? | 7.0 × 10^-? | 2.0 × 10^-?
Signaling and immune response | GO:0019221 | Cytokine-mediated signaling pathway | 12 | 3 | 1.8 × 10^-? | 1.0 × 10^-? | 6.3 × 10^-?
Development | GO:0048731 | System development | 52 | 28 | 1.0 × 10^-? | 0.0 | 6.1 × 10^-?
Development | GO:0048812 | Neuron projection morphogenesis | 14 | 5 | 1.0 × 10^-? | 1.4 × 10^-? | 4.7 × 10^-?
Other | GO:0016311 | Dephosphorylation | 9 | 4 | 1.6 × 10^-? | 0.0 | 2.0 × 10^-?
Other | GO:0006457 | Protein folding | 8 | 3 | 3.6 × 10^-? | 3.7 × 10^-? | 1.6 × 10^-?
Other | GO:0016192 | Vesicle-mediated transport | 20 | 5 | 1.0 × 10^-? | 3.8 × 10^-? | 3.0 × 10^-?
Other | GO:0007010 | Cytoskeleton organization | 20 | 9 | 3.1 × 10^-? | 0.0 | 4.8 × 10^-?

doi:10.1371/journal.pcbi.1004088.t004
Fig 2. Clustering of human proteins targeted by known B. mallei virulence factors. The graphic shows 116 proteins of the largest connected
component of the human PPI network that belong to one or more statistically significant interaction modules. Note that each of these human proteins also
interacted with one or more known B. mallei virulence factors. As exemplified by the annotated interaction modules in A-F, the known virulence factors
targeted human proteins that were highly interacting among themselves and belonged to the same biological process. For a list of host proteins that compose
each interaction module, see S2 Table. Annotated interaction modules (A-F): ubiquitination and ligase activity; transcriptional regulation; immune
response; cytoskeleton organization, organelle organization, and cell morphogenesis; mRNA processing; cell communication and signaling.

doi:10.1371/journal.pcbi.1004088.g002


Putative  B.  mallei  virulence  factors  improve  characterization  of  B.  mallei 
targets 

Given that our host-B. mallei interaction dataset contained a number of putative virulence factors,
we also evaluated the effect of adding these virulence factors to our analysis to characterize
host targets. As in the analyses above, we first evaluated the prevalent characteristics
of human proteins using GO annotation [23]. The identified molecular annotations largely
matched those identified for known virulence factors only, but also included additional GO
terms, such as terms related to RNA metabolic processes (S3 Table). Table 3 shows that the
analysis of topological properties of host proteins interacting with known and putative virulence
factors displayed the same trends observed in the analysis of the interacting partners of
known virulence factors. Next, we evaluated the extent to which known and putative virulence
factors also targeted connected subsets of host PPIs. We identified 75 statistically significant
interaction modules whose GO biological process annotations largely overlapped with the ones
identified for interacting partners of known virulence factors only. Although the number of
statistically significant interaction modules was smaller than above (an increase in the number of
host proteins dilutes the enrichment), the addition of new host proteins increased the size (in
terms of proteins and interactions) of previously identified interaction modules (S4 Table).
This suggests that with the increase of protein annotation or with the identification of additional
host-B. mallei PPIs, we will be able to identify larger and more complete host interaction
modules targeted by B. mallei virulence factors.
Consequently, we used all the interactions shown in Fig. 1 to identify biological pathways
targeted by all tested B. mallei proteins using the Kyoto Encyclopedia of Genes and Genomes
(KEGG) annotation database [32]. We identified two statistically significantly enriched host
pathways: bacterial invasion of epithelial cells and focal adhesion. Fig. 3 shows that the proteins
targeted in the focal adhesion pathway appeared to be coordinated for pathway activation and
largely interacted with each other (yellow boxes). The majority of these molecular interactions
belonged to a connected sub-pathway located at the beginning of the pathway (the probability
of observing such connectivity at random is < 10"'’), and they provided a link between membrane
receptors and signaling events that led to reorganization of the actin cytoskeleton.


Human-B. mallei interactions and their effect on the crosstalk between
different biological processes

One of the most prominently recurring results across all of our analyses was the link between
B. mallei pathogenicity and host cytoskeleton organization. It has been shown that a number of
bacterial pathogens, including Yersinia, Salmonella, Shigella, Listeria, and Burkholderia, interfere
with host signaling pathways to stimulate the host's cytoskeleton rearrangement [2, 33].
These changes in signaling lead to changes in the host-cell shape and facilitate bacterial internalization
and cell-to-cell spread [33]. Fig. 4 shows host proteins that interacted with known
and putative B. mallei virulence factors that can be directly associated with cytoskeleton organization.
The largest statistically significant interaction module, represented by red stars,
contained proteins previously identified as bacterial targets vital for host actin cytoskeleton
rearrangement, e.g., membrane-associated small GTPases (CDC42 and RALA), Filamin-A
(FILA), and Rho GDP-dissociation inhibitor (ARHGDIB) [2, 33]. The remaining cytoskeleton-related
host proteins, represented as dark red circles, participated in smaller cytoskeleton organization
interaction modules that were, on average, less than two proteins (three interactions
or edges) away from the largest module.

Fig 3. Focal adhesion pathway as a virulence factor target. We identified the Kyoto Encyclopedia of Genes and Genomes (KEGG) focal adhesion
pathway as enriched with multiple virulence factor targets. The majority (17 of 20) of host proteins interacting with B. mallei virulence factors in this pathway
belong to a connected sub-pathway of proteins (yellow boxes and red lines), mainly grouped at the beginning of the pathway. This observation implied that
receptors and signaling molecules were likely B. mallei targets, corroborating previous observations that pathogens tend to interfere with host processes
related to cell communication and actin cytoskeleton organization. The pathway diagram for the focal adhesion pathway was adapted from the KEGG
pathway map [32] with permission from the KEGG database administrators.

doi:10.1371/journal.pcbi.1004088.g003

Given that connector proteins (represented as white and yellow circles in Fig. 4) between
the cytoskeleton organization interaction modules were annotated with biological processes
different from cytoskeleton organization, and given the multifunctional nature of some cytoskeletal
reorganization proteins, we examined the occurrence of shared proteins that interacted
with multiple pathways, i.e., pathway crosstalk. Initially, out of all human proteins interacting
with the examined B. mallei proteins, we evaluated the relationships among proteins involved
in the focal adhesion pathway that participated in or interacted with human proteins associated
with cytoskeleton organization. The shaded area in Fig. 4 shows that six of 12 proteins were
present in both systems. Fig. 5 shows an extension of this analysis that includes B. mallei-interacting
host proteins that are components of eight other molecular pathways that shared proteins
with the focal adhesion pathway. Fig. 5A (left) shows that among these nine pathways,
the number of pathways that shared one or more proteins was low. However, the number of
PPIs connecting the proteins from one pathway with proteins from another pathway was
markedly higher [Fig. 5A (right)]. The large number of signaling pathways affected via the
focal adhesion pathway magnified the effects of these cross-pathway interactions. Fig. 5B illustrates
the propagation and number of cross-pathway interactions that were mediated via the
focal adhesion pathway and shows the known virulence factors (Table 1) that can be associated
with each pathway. Thus, virulence factors affected biological processes and molecular pathways
associated with multiple interconnecting host processes, providing an explanation of how
interference with the function of a single protein propagated to and influenced multiple host
processes and pathways.

Fig 4. Actin cytoskeleton organization as a virulence factor target. Human proteins targeted by B. mallei
proteins formed an interaction module that was primarily linked to cytoskeleton organization and focal
adhesion. Twenty-five of these proteins were involved in cytoskeleton organization processes; 13 of them
(red stars) interacted with each other (forming an interaction module), and the remaining 12 proteins (dark red
circles) were on average < 2 nodes (< 3 edges) away from the interaction module. The figure also shows the
overlap between the cytoskeleton organization interaction module and the focal adhesion pathway (shaded
area), where connecting protein interactions from focal adhesion pathway proteins or other proteins appear
as smaller circles and dashed lines. Note that all human proteins shown interacted with one or more B.
mallei proteins.

doi:10.1371/journal.pcbi.1004088.g004

Fig 5. Crosstalk between host pathways targeted by B. mallei virulence factors. A) The number of shared proteins and shared pathway protein-protein
interactions (PPIs) among human proteins interacting with B. mallei that appeared in the focal adhesion pathway and in up to eight other molecular pathways
that shared proteins (partially overlapped) with this pathway. The number of shared proteins across pathways was smaller than the number of shared
pathway PPIs. B) The location and number of crosstalk interactions affected by B. mallei centered around the focal adhesion pathway and appear as arrows
with line thicknesses proportional to the number of shared PPIs. The identity and number of virulence factors that target each pathway are illustrated using a
word cloud. By preferentially targeting signaling pathways, the effect of one protein modulated through interaction with a virulence factor could propagate and
disproportionately influence a larger number of biological processes.

doi:10.1371/journal.pcbi.1004088.g005


Using  multiple  host-pathogen  interaction  networks  to  predict  the  role  of 
pathogen  proteins 

Our statistical analyses show that the aggregated host-pathogen interaction data could identify
host molecular mechanisms targeted by B. mallei. However, detecting specific mechanisms of
action for each pathogen protein based on enrichment analysis of large-scale Y2H protein
interaction data is not trivial. This partly stems from experimental, biological, and statistical
considerations. For example, the Y2H methodology is biased toward certain types of interactions in a
non-native environment [34], binding events may or may not be biologically relevant, and
statistical testing is hampered by small effect sizes and low statistical power. Conversely, this
and previous studies have shown that multiple pathogens tend to target the same host proteins,
biological processes, and pathways [2, 9, 10, 18]. Hence, one could potentially use common
pathogenic mechanisms to more robustly characterize bacterial proteins and their host targets.

We explored a focused set of human-pathogen interactions derived from putative virulence
factor proteins from S. enterica and Y. pestis; the majority of these proteins are associated with
a Type 3 Secretion System (T3SS). These datasets contained 62 host-pathogen interactions for
21 S. enterica proteins [12] and 223 interactions for 69 Y. pestis proteins [13]. An initial
orthology-based approach to retrieve annotations proved too restrictive and did not generate any
novel insights into B. mallei virulence. Instead, we used an alternative network alignment-based
methodology optimized for inter-species alignment, i.e., we differentiated between host
and pathogen proteins and avoided mapping host proteins to pathogen proteins and vice
versa. As detailed in Materials and Methods, we introduced a novel alignment algorithm
(HPIA) designed specifically for the alignment of cross-species interactions. We used the
HPIA algorithm to identify similarities between host-pathogen interactions using the B. mallei,
S. enterica, and Y. pestis PPI datasets, based on a combined similarity measure that included
topological similarity, sequence similarity, and functional similarity. Table 5 lists the B. mallei
proteins, their aligned protein partners, and inferred function(s) derived from the alignment.

Table 5 shows B. mallei proteins with a known/assumed function in pathogenicity and the
corresponding functionality, predicted based on the function of their aligned Y. pestis and
S. enterica partners. We identified similarities between T3SS proteins involved in bacterial
internalization for all three pathogens, including the orthologs BipB-SipB and BipC-SipC.
Additionally, the B. mallei PilA protein was aligned to Y. pestis fimbrial protein FimA6, another cell
adhesion protein. Furthermore, although the S. enterica and Y. pestis interaction datasets
included mainly T3SS proteins, the two B. mallei Vgr proteins associated with bacterial survival
and replication via Type 6 Secretion Systems (T6SS) were aligned to S. enterica and Y. pestis
proteins also known to promote bacterial survival and replication. Thus, while the aligned
proteins may have different roles within each pathogen, it is possible that they interact with
similar types of host proteins, causing the alignment algorithm to capture these similarities. Overall,
the alignment-based inferred roles for six of the 11 (55%) annotated B. mallei proteins
matched their existing annotation and their corresponding secretion system assignment
(Table 5, fourth and second columns, respectively). If we only consider matching functionality
and ignore the Vgr association to T6SS, the inferred functions matched the existing annotation
in eight of 11 (73%) cases.
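
The bookkeeping behind such match rates can be illustrated with a minimal sketch. It assumes that a B. mallei protein's inferred roles are simply the union of the roles annotated for its aligned partners, and that a prediction "matches" when it shares at least one role with the existing annotation; both rules, and all protein and role names below, are hypothetical simplifications and not the HPIA implementation.

```python
# Toy illustration of alignment-based role transfer.
# All protein names and role labels are hypothetical placeholders.

def infer_roles(aligned_partners, partner_roles):
    """Union of the roles annotated for the aligned partner proteins."""
    roles = set()
    for partner in aligned_partners:
        roles |= partner_roles.get(partner, set())
    return roles


def match_rate(alignments, partner_roles, known_roles):
    """Fraction of annotated proteins whose inferred roles overlap the known ones."""
    annotated = [p for p in alignments if known_roles.get(p)]
    hits = sum(
        1
        for p in annotated
        if infer_roles(alignments[p], partner_roles) & known_roles[p]
    )
    return hits / len(annotated) if annotated else 0.0


# Roles of the aligned partner proteins (assumed annotations).
partner_roles = {
    "sal_1": {"internalization"},
    "yer_1": {"adhesion"},
    "sal_2": {"survival"},
    "yer_2": {"T3SS regulation"},
}
# Each pathogen protein is aligned to one partner per reference species.
alignments = {
    "bm_A": ["sal_1", "yer_1"],
    "bm_B": ["sal_2", "yer_2"],
    "bm_C": ["sal_1", "yer_2"],
}
# Existing annotations; an empty set means "no known role" and is skipped.
known_roles = {
    "bm_A": {"adhesion"},
    "bm_B": {"motility"},
    "bm_C": set(),
}

rate = match_rate(alignments, partner_roles, known_roles)
print(f"matched {rate:.0%} of annotated proteins")
```

With the counts reported above, the same ratio gives 6/11 (about 55%) and 8/11 (about 73%).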

For the B. mallei virulence factors without known functions in pathogenicity, listed in the
lower part of Table 5, the functional mappings from S. enterica and Y. pestis provided an
indication of their mechanistic role in virulence. Of special importance were the three
novel virulence factors we identified from the Y2H data and experimentally verified in a B.
mallei ATCC 23344 aerosol mouse infection model: BMAA1865, BMAA0728, and
BMAA0553. The HPIA algorithm identified similarities between BMAA1865, S. enterica
protein SopE2, and Y. pestis protein YopE, based on their interactions with host proteins
involved in actin-cytoskeleton rearrangement processes. The alignment also identified similarities
between B. mallei protein BMAA0728 and S. enterica protein SseL based on their
interactions with host proteins involved in ubiquitination. Furthermore, the HPIA algorithm
identified similarities between B. mallei protein BMAA0553 and S. enterica protein SseI,
associated with regulation of the host cytoskeleton and inhibition of cell motility. These results
imply that the hypothetical protein BMAA1865 has a role in host actin-cytoskeleton
manipulation, that BMAA0728 has a role in (de)ubiquitination, and that serine/threonine
phosphatase BMAA0553 has a role in cytoskeleton regulation. These are the same roles we
previously proposed for these three proteins based on a literature review of pathogenic
mechanisms [8].

Table 5. Putative role of B. mallei proteins inferred from host-pathogen interaction network alignment results.

| Locus tag (name) | Secretion system association | Description | Known role | Aligned S. enterica protein | Aligned Y. pestis protein | Putative (alignment-predicted) role |
|---|---|---|---|---|---|---|
| BMA0278 (PilA)† | Type 2 | Type IV pilin | Cell adhesion | AvrA | FimA6 | Cell adhesion; Promotion of bacterial survival |
| BMAA0445 (VgrG)‡,* | Type 6 | Rhs element Vgr protein | Promoting bacterial survival and replication | SifA | YPC2940 | Cell adhesion; Promotion of bacterial survival |
| BMAA0446 (VgrG)‡,* | Type 6 | Rhs element Vgr protein | Promoting bacterial survival and replication | SseG | PilF | Replication niche establishment |
| BMAA0749 (BimA)* | Type 5 | Hemagglutinin domain protein | Actin based motility | SopB | CmpA | Promotion of bacterial survival; Bacterial internalization |
| BMAA1269 (VgrG) | Type 6 | Rhs element Vgr protein | Promoting bacterial survival and replication | SseJ | LcrD | Regulation of T3SS secretion; Promotion of bacterial survival |
| BMAA1521 (BopA)†,* | Type 3 | Effector protein | Bacterial internalization and promoting bacterial survival | SptP | YscK | Bacterial internalization |
| BMAA1528 (BipD)†,* | Type 3 | Translocator protein | Bacterial internalization | SipC | YopT | Bacterial internalization; Interference with host cytoskeleton |
| BMAA1530 (BipC)†,* | Type 3 | Effector protein | Bacterial internalization | SipC | LcrV | Regulation of T3SS activation; Bacterial internalization; Interference with host cytoskeleton |
| BMAA1531 (BipB)†,* | Type 3 | Translocator protein | Bacterial internalization | SipB | YscS | Bacterial internalization |
| BMAA1538 (BsaU)†,* | Type 3 | Type 3 secretion protein (needle assembly) | Bacterial internalization | SopE | YscL | Bacterial internalization; Interference with host cytoskeleton |
| BMA0429 (Cmk) | Type 3 | Cytidylate kinase | Kinase activity; ATP binding | SpiC | YscN | Regulation of T3SS secretion |
| BMAA1525 (BapB)* | Type 3 | Type 3 secretion protein | N/A | SspH2 | TyeA | Regulation of T3SS secretion; Interference with host ubiquitination |
| BMAA1865 | Type 3 | Hypothetical protein | N/A | SopE2 | YopE | Bacterial internalization; Interference with host cytoskeleton |
| BMAA0728 (TssN) | Type 6 | Hypothetical protein | N/A | SseL | YpkA | Interference with host signaling; Interference with host ubiquitination |
| BMA0267 | - | Pseudogene | N/A | SipA | YPC4044 | Bacterial internalization |
| BMAA0553 | Type 2 | Ser/Thr protein phosphatase | N/A | SseI | YPC2113 | Promotion of bacterial survival |
| BMAA0679 | - | Chemotaxis protein CheC | N/A | SspH1 | YscY | Bacterial internalization; Interference with host ubiquitination |
| BMA2469 (Tkt) | Type 3 | Transketolase | N/A | SipA | YscX | Bacterial internalization; Interference with host cytoskeleton |
| BMA3281 (FliF) | Type 3 | Flagellar M-ring protein | N/A | PipB2 | YPMT1.42ac | Promotion of bacterial survival |
| BMAA0238 | Type 2 | Hypothetical protein | N/A | SlrP | YopN | Regulation of T3SS secretion; Interference with host ubiquitination |
| BMAA1619 | Type 3 | Hypothetical protein | N/A | SpvB | HofG7 | Promotion of bacterial survival |

For detailed information about the inferred roles, see S5 Table.

†Proteins that matched their existing annotation and secretion system (if known).
‡Proteins that matched their existing annotation but not the secretion system.
*Proteins that have been experimentally linked to a particular secretion system.

doi:10.1371/journal.pcbi.1004088.t005

The alignment also identified a putative role of another protein of interest from our previous
study, cytidylate kinase BMA0429. The host-pathogen PPI data linked this protein to multiple
processes related to pathogenicity. We were not able to test its pathogenicity in an animal
model, because this protein appeared to be essential. However, the alignment results imply that
this protein has a role in the regulation of T3SS secretion, as it was mapped to two T3SS regulators
that are more likely to be localized in the bacterial cytoplasm than to be translocated into
the infected cell: SpiC in S. enterica and YscN in Y. pestis [35-38].


Multiple  B.  mallei  virulence  factors  target  eukaryotic-specific  host-cell 
processes 

A key insight into the virulence mechanisms that we could derive from the Y2H interactions
was that B. mallei targeted eukaryotic-specific cellular mechanisms, such as ubiquitination and
focal adhesion. Thus, the specific virulence adaptations retained in the evolution of B. mallei as
an obligate mammalian pathogen include targeting the ubiquitination degradation/signaling
system and using the focal adhesion pathway as a fulcrum for transmitting mechanical forces
and regulatory signals. This provides the mechanisms to modulate and adapt the host-cell
environment for the successful establishment of host infections.

Based on our analysis of their host protein interactions and targeted pathways, the nine
known virulence factors shared many common points of attack on the host cell's physiology.
Expanding on the cross-talk analysis shown in Fig. 5B, we created an interconnected
host-pathway map. We inferred the connection between two pathways from the number of
inter-pathway PPIs in which both host proteins interact with at least one B. mallei protein. The
larger the number of such host protein pairs, the larger the potential influence B. mallei has
on the cross-talk between the involved pathways. Fig. 6 shows the extent of this influence
between host pathways and the central role the focal adhesion pathway plays in propagating cell
signaling and affecting key host cellular processes relevant to B. mallei pathogenesis. The pathways
implicated in Fig. 5B that are directly related to focal adhesion are marked with stars in Fig. 6.
These, in turn, are interconnected with a large number of signaling pathways (Fig. 6, grey
background) that ultimately control cell cycle, morphology, and growth. A number of known
disease processes, marked with red symbols, are also interconnected or directly connected to this
signaling network. We hypothesized that this virulence factor host-pathogen network is central
in controlling key cellular mechanisms that allow B. mallei to adapt the host cell environment
and ensure robust infection.
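
The counting rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: pathway membership is given as plain sets, an interaction counts for a pathway pair when one partner belongs to each pathway, and the function name, pathway names, and protein identifiers are hypothetical. How proteins that belong to both pathways were handled in the actual analysis is not specified here.

```python
from itertools import combinations


def crosstalk_counts(pathways, ppis, targeted):
    """Count inter-pathway PPIs whose two host proteins are both pathogen targets.

    pathways: dict mapping pathway name -> set of host proteins
    ppis:     iterable of (protein, protein) host-host interactions
    targeted: set of host proteins bound by at least one pathogen protein
    """
    counts = {}
    for (name_a, prots_a), (name_b, prots_b) in combinations(sorted(pathways.items()), 2):
        n = 0
        for u, v in ppis:
            # Both partners must interact with the pathogen.
            if u not in targeted or v not in targeted:
                continue
            # The interaction must bridge the two pathways (either orientation).
            if (u in prots_a and v in prots_b) or (u in prots_b and v in prots_a):
                n += 1
        if n:
            counts[(name_a, name_b)] = n
    return counts


# Hypothetical toy input.
pathways = {
    "focal_adhesion": {"P1", "P2"},
    "mapk": {"P3", "P4"},
    "wnt": {"P5"},
}
ppis = [("P1", "P3"), ("P2", "P4"), ("P1", "P5"), ("P3", "P5")]
targeted = {"P1", "P2", "P3", "P4"}

counts = crosstalk_counts(pathways, ppis, targeted)
print(counts)
```

In a map such as Fig. 6, these counts would set the line widths, and pairs below a chosen cutoff (10 interactions in the figure) would be omitted.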
Fig 6. Focal adhesion as a central hub for targeting host cells. The number of shared protein-protein interactions (PPIs) targeted by B. mallei virulence
factors is shown as lines proportional to the number of PPIs (only connections with 10 or more interactions are illustrated). Physiological, cancer, and disease
pathways were all interconnected via signaling pathways that could be affected through the focal adhesion pathway.

doi:10.1371/journal.pcbi.1004088.g006 


Summary 

Given the association of the selected pathogen proteins with secretion systems, the underlying
Y2H methodology, and our analysis methodology, our detection capabilities were geared toward
finding host pathways and biological processes targeted by B. mallei via virulence factors. The
limitation of this approach is that while a host-pathogen protein interaction may occur, as
determined via Y2H experimentation, this type of data does not allow us to resolve when, where,
or why such interactions are important. Furthermore, even though the strict statistical
threshold at a false discovery rate (FDR) < 5% reduces the chances of identifying random data
correlations, it does not test our hypothesis that the pathway is involved in B. mallei virulence.
Conversely, the strength of our analysis is that the identified host interactions are dominated
by known and validated virulence factors, allowing us to create new hypotheses around the
biological interpretation of pathway interaction patterns.

Our results showed that host-pathogen PPIs represent a rich source of information about
molecular mechanisms of pathogenicity, and that these interactions can be used to identify and
characterize host molecular pathways and processes targeted by pathogens. Specifically, our
topological analysis of human-B. mallei protein interactions showed that known and putative
B. mallei virulence factors tend to target multifunctional host proteins, host proteins that
interact with each other, and host proteins with a large number of interacting partners.
Additionally, the analysis identified a number of host processes and pathways relevant to B. mallei
pathogenicity, many of which have been linked to bacterial pathogenicity in previous
experimental studies, e.g., signaling and communication, protein modification and regulation,
cytoskeleton organization, and focal adhesion. Furthermore, the topological analysis suggested that
B. mallei virulence factors target host molecular processes through interference with their
direct and indirect host-interacting partners, implying that the process of pathogenic
internalization and intracellular survival requires the modulation of multiple host cellular processes.

We further introduced the novel HPIA algorithm that can be used to identify common sets
of host-pathogen interactions by aligning (mapping) host-to-host and pathogen-to-pathogen
proteins from two interaction datasets. We used the HPIA algorithm to compare
human-B. mallei interactions to those of human-Y. pestis and human-S. enterica and identified a
statistically significant number of aligned interactions. We also showed that the resulting alignments
could be used to predict roles of B. mallei proteins based on the roles of their aligned Y. pestis
and S. enterica partners.

Finally, given that nine of 21 proteins in our dataset are known virulence factors, we could
hypothesize on why and how B. mallei uses these proteins to overcome multiple defense
systems and orchestrate a robust infection process in mammalian hosts. Ultimately, the bacterial
host-virulence program is derived from a survival strategy developed in the rhizosphere, i.e., in
a generally competitive environment containing multiple, diverse species. Using multiple
virulence factors to target eukaryotic-specific mechanisms common to eukaryotic rhizosphere
species, B. mallei broadly influences key processes in ubiquitination and cell signaling to modulate
and adapt the host-cell environment for its benefit.


Materials  and  Methods 

Human-B. mallei protein interaction set

To create a comprehensive set of human-B. mallei PPIs, we merged the human-B. mallei and
murine-B. mallei PPI datasets identified in [8]. These datasets contained 586 interactions between
409 human and 21 B. mallei proteins, and 797 interactions between 574 murine and 25 B. mallei
proteins; 19 B. mallei proteins appeared in both sets, including nine known B. mallei virulence
factors (Table 1). When creating the merged set, we considered only the subset of murine-B. mallei
PPIs in which the B. mallei proteins also interacted with human proteins and, thus, had
shown the ability to bind to human proteins. The merging procedure consisted of four steps. In
the first step, we identified B. mallei proteins that interacted with both hosts (19 B. mallei
proteins). In the second step, we found human orthologs for each of the 419 (73%) murine proteins that
interacted with the B. mallei proteins identified in step 1. In the third step, we assessed whether the
human orthologs constituted unique proteins, i.e., whether they were not already part of the
experimentally detected human-B. mallei interactions. If they were unique, we added the corresponding
interaction to the orthologous human-B. mallei dataset. The resulting orthologous dataset consisted of 649 interactions
between 419 human proteins and 19 B. mallei proteins, corresponding to 82% of the
murine-B. mallei PPIs. In the fourth step, we merged the experimental human-B. mallei dataset with the
orthologous human-B. mallei dataset to create a merged set of human-B. mallei PPIs. The resulting
merged dataset consisted of 1,235 unique interactions between 21 B. mallei and 828 human
proteins (S1 Data and S6 Table). Approximately 72% (890) of these represent interactions among
the nine known B. mallei virulence factors (Table 1) and 663 unique human proteins. All
proteins were annotated by their official gene symbols as defined in the HUGO Gene
Nomenclature Committee database [39].
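
The four merging steps can be sketched in a few lines. The sketch below reads step 3 as a protein-level filter (an ortholog is kept only if that human protein is absent from the experimental human set), which is consistent with the reported totals (409 + 419 = 828 proteins, 586 + 649 = 1,235 interactions) but remains an assumption; the function name, the ortholog dictionary, and all identifiers are hypothetical.

```python
def merge_interactions(human_ppis, murine_ppis, mouse_to_human):
    """Merge experimental human PPIs with ortholog-mapped murine PPIs.

    human_ppis, murine_ppis: sets of (host_protein, pathogen_protein) pairs
    mouse_to_human:          dict mapping murine protein -> human ortholog
    Returns (merged_set, orthologous_set).
    """
    # Step 1: pathogen proteins that interacted with both hosts.
    shared_pathogen = {b for _, b in human_ppis} & {b for _, b in murine_ppis}
    experimental_hosts = {h for h, _ in human_ppis}

    orthologous = set()
    for mouse_protein, pathogen_protein in murine_ppis:
        if pathogen_protein not in shared_pathogen:
            continue
        # Step 2: map the murine protein to its human ortholog, if any.
        human_protein = mouse_to_human.get(mouse_protein)
        if human_protein is None:
            continue
        # Step 3: keep only orthologs not already seen experimentally (assumed reading).
        if human_protein in experimental_hosts:
            continue
        orthologous.add((human_protein, pathogen_protein))

    # Step 4: union of experimental and orthologous interactions.
    return human_ppis | orthologous, orthologous


# Hypothetical toy input.
human_ppis = {("H1", "bmA"), ("H2", "bmB")}
murine_ppis = {("m1", "bmA"), ("m2", "bmA"), ("m3", "bmC"), ("m4", "bmB")}
mouse_to_human = {"m1": "H1", "m2": "H3", "m3": "H4"}

merged, orthologous = merge_interactions(human_ppis, murine_ppis, mouse_to_human)
print(sorted(merged))
```

Here `m1` is dropped because its ortholog `H1` is already in the human set, `m3` because `bmC` never bound a human protein, and `m4` because it has no ortholog in the toy mapping.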

We used the National Center for Biotechnology Information HomoloGene database of
homologs (http://www.ncbi.nlm.nih.gov/homologene) to identify human-murine orthologs [40].


Topological  properties  of  human  proteins  interacting  with  B.  mallei  in  the 
human  PPI  network 

We calculated the following topological properties for a set of human proteins interacting with
B. mallei: 1) the number of human proteins interacting with B. mallei proteins ($N_p$); 2) the
average number of their interacting partners in the human PPI network ($D$); 3) the clustering
coefficient, i.e., the number of interactions among the nearest neighbors ($C$); 4) the average
shortest path between any two proteins in the set ($SP$); 5) the average number of interacting
partners in the human PPI network where both partners interact with B. mallei proteins ($D_i$);
and 6) the number of host proteins in the largest connected component ($N_{LCC}$). All
calculations were performed in R using the igraph package [41]. We evaluated whether the observed
values for each of the five properties were statistically significant as follows. From the human
interactome, we randomly selected the same number of proteins as the number of proteins
interacting with B. mallei virulence factors. Next, we calculated each of the five topological
properties for this random set of proteins, repeating the procedure 10^ times. This procedure
yielded 10^ values for each property, which followed a Normal distribution (Normality was
evaluated using quantile-quantile plots and the Kolmogorov-Smirnov test [42], which gave no
evidence that the distributions were not Normal). Then, for each property, we compared the
observed value for proteins interacting with B. mallei with the values obtained for random
protein sets using a Z-score. Finally, we computed the p-values corresponding to the resulting
Z-scores.
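
The randomization test described above can be sketched for a single property, the average number of interacting partners. This is only an illustration in Python (the analysis itself was performed in R with igraph); the toy network, the function names, the number of repetitions, and the choice of a two-sided p-value are assumptions made for the example.

```python
import math
import random
import statistics

def average_degree(adjacency, proteins):
    """Average number of interacting partners (D) of the given proteins."""
    return statistics.mean(len(adjacency[p]) for p in proteins)

def z_score_test(adjacency, targeted, n_random=1000, seed=0):
    """Compare the observed D with D of equally sized random protein sets."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    observed = average_degree(adjacency, targeted)
    null = [average_degree(adjacency, rng.sample(nodes, len(targeted)))
            for _ in range(n_random)]
    mu, sigma = statistics.mean(null), statistics.stdev(null)
    z = (observed - mu) / sigma
    # Two-sided p-value under the Normal approximation (an assumption here).
    p = math.erfc(abs(z) / math.sqrt(2))
    return observed, z, p

# Hypothetical toy interactome: a hub "H" linked to ten leaves, plus a short chain.
adjacency = {"H": {f"L{i}" for i in range(10)}}
for i in range(10):
    adjacency[f"L{i}"] = {"H"}
adjacency.update({"A": {"B"}, "B": {"A", "C"}, "C": {"B"}})

observed, z, p = z_score_test(adjacency, ["H", "B"])
```

The same loop applies to the other properties by swapping `average_degree` for the corresponding statistic.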


Gene  set  functional  enrichment  analyses 

We performed GO and KEGG enrichment analyses in R using the Bioconductor packages BioMart
and KEGGgraph, respectively [43, 44]. As the universe of human proteins, we used all
constituent proteins from the human PPI network. As GO terms are specified at multiple levels
of detail, we used a complete GO tree annotation, excluding the root and the top two levels of
GO terms. GO annotation was obtained from BioMart [43]. For the KEGG enrichment analysis,
as the universe of human proteins, we used the human proteins available in KEGGgraph
that participated in at least one KEGG pathway [44]. All obtained p-values were corrected for
multiple testing using the Benjamini and Hochberg procedure [45]. We retained only annotations
that were enriched at an FDR control level of 0.05, i.e., the expected proportion of false
positives among the retained annotations is at most 5%.
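
As an illustration of the multiple test correction, the Benjamini and Hochberg adjustment can be written in a few lines. The sketch below is in Python rather than R, and the p-values are invented for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for five annotation terms.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)

# Terms retained at an FDR control level of 0.05.
retained = [i for i, q in enumerate(q_values) if q <= 0.05]
```

In this example only the first two terms are retained, although four of the raw p-values are below 0.05.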

We performed two types of enrichment analysis: standard enrichment analysis and network-based
enrichment analysis. In the standard enrichment analysis, we computed the probability of
observing the number of proteins annotated with a given term using the hypergeometric
distribution. In the network-based enrichment analysis, we first identified the largest connected
component in the human interactome that consisted of human proteins interacting with at least one
B. mallei virulence factor (denoted as LCC). Then, we counted the number of proteins $n_t$ in the
LCC that were annotated with a GO biological process term $t$. Additionally, for each term $t$, we
identified connected (sub)networks of LCC in which all proteins were annotated with $t$; these we
termed interaction modules. We denoted such interaction modules as $IM_t$ and the number of
proteins in these modules as $m_t$.
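
Both types of enrichment analysis can be illustrated with a short Python sketch. The first function gives the hypergeometric upper-tail probability; the second extracts connected subnetworks in which every protein carries a given annotation. The protein names are hypothetical, and requiring at least two proteins per module is an assumption of this sketch, not a statement from the text.

```python
from math import comb

def hypergeom_tail(universe, annotated, selected, overlap):
    """P(X >= overlap) when `selected` proteins are drawn without replacement
    from `universe` proteins, of which `annotated` carry the term."""
    total = comb(universe, selected)
    upper = min(annotated, selected)
    return sum(comb(annotated, k) * comb(universe - annotated, selected - k)
               for k in range(overlap, upper + 1)) / total

def interaction_modules(adjacency, annotated_proteins):
    """Connected subnetworks (two or more proteins) made only of annotated proteins."""
    annotated = set(annotated_proteins) & set(adjacency)
    seen, modules = set(), []
    for start in sorted(annotated):
        if start in seen:
            continue
        stack, module = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            module.add(node)
            for nb in adjacency[node]:
                if nb in annotated and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if len(module) > 1:  # assumed minimum module size
            modules.append(module)
    return modules

# Standard enrichment: 3 of 4 selected proteins annotated, 5 annotated in a universe of 20.
p_standard = hypergeom_tail(20, 5, 4, 3)

# Network-based: a hypothetical LCC given as a chain P1-P2-P3-P4-P5.
lcc = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2", "P4"},
       "P4": {"P3", "P5"}, "P5": {"P4"}}
modules = interaction_modules(lcc, {"P1", "P2", "P4"})
module_size = sum(len(m) for m in modules)
```

Here P4 is annotated but has no annotated neighbor, so it does not contribute to any module.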
We evaluated whether the observed $IM_t$ interaction modules were statistically significant as
follows. First, we evaluated whether the observed interacting proteins and their corresponding
biological processes in the LCC were statistically significant compared to an equal number of
random proteins. For each GO term $t$, we counted the number of proteins in the random set
that were annotated with $t$ (denoted as $r_t$). Additionally, for each $t$, we identified
interaction modules in which all proteins from the random set were annotated with $t$. We denoted
the number of proteins in such interaction modules as $s_t$. We repeated this procedure 10^ times.
Given the obtained values, we defined the probability $p_t$ of observing proteins annotated with
$t$ as

$$p_t = \frac{1}{N} \sum_{i=1}^{N} k_i, \quad \text{where } k_i = \begin{cases} 1 & \text{if } (r_t)_i > n_t \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

and the probability $p_{rp}$ of observing an interaction module $IM_t$, given a random set of host
proteins, as

$$p_{rp} = \frac{1}{N} \sum_{i=1}^{N} k_i, \quad \text{where } k_i = \begin{cases} 1 & \text{if } (s_t)_i > m_t \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

where $N$ is the number of repetitions, $(r_t)_i$ and $(s_t)_i$ are the values obtained in the
$i$-th repetition, and $n_t$ and $m_t$ are the observed numbers of proteins annotated with $t$ in
the LCC and in its interaction modules, respectively.
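
The two probabilities above are empirical p-values: the fraction of random repetitions in which the random count exceeds the observed one. A minimal Python sketch follows; the strict inequality follows the equations as printed, and the universe, annotation set, set size, and number of repetitions are invented for illustration.

```python
import random

def empirical_p(observed, random_values):
    """Fraction of random repetitions whose value exceeds the observed one."""
    return sum(1 for v in random_values if v > observed) / len(random_values)

def random_annotated_counts(universe, annotated, set_size, repetitions, seed=0):
    """For each repetition, count annotated proteins in a random protein set."""
    rng = random.Random(seed)
    universe = sorted(universe)
    return [len(annotated.intersection(rng.sample(universe, set_size)))
            for _ in range(repetitions)]

# Hypothetical universe of 100 proteins, 10 of which carry the term of interest.
universe = [f"P{i}" for i in range(100)]
annotated = set(universe[:10])

# Suppose 8 of the 20 targeted proteins were annotated with the term.
counts = random_annotated_counts(universe, annotated, set_size=20, repetitions=1000)
p_t = empirical_p(8, counts)
```

The module-based probability is obtained the same way, replacing the per-set annotation count with the number of proteins found in interaction modules of each random set.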


Second, we evaluated whether the observed interacting proteins and their corresponding
biological processes were statistically significant compared to random interactions. We randomly
rewired the human protein interaction network, while preserving the same degree distribution
as observed in the original network. Next, we mapped human proteins from the LCC on the
rewired network and, for each term $t$, identified interaction modules in which all proteins were
annotated with $t$. We denoted the number of proteins in such modules as $w_t$. We repeated this
procedure 10^ times. Finally, we calculated the probability $p_{rw}$ of observing an interaction
module, $IM_t$, given a random set of host interactions, as

$$p_{rw} = \frac{1}{N} \sum_{i=1}^{N} k_i, \quad \text{where } k_i = \begin{cases} 1 & \text{if } (w_t)_i > m_t \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

where $N$ is the number of repetitions, $(w_t)_i$ is the value of $w_t$ in the $i$-th repetition,
and $m_t$ is the number of proteins observed in the $IM_t$ interaction modules.

$IM_t$ interaction modules with $p_t < 0.01$, $p_{rp} < 0.01$, and $p_{rw} < 0.01$ contain a
statistically significant number of human proteins and interactions among them, and are
statistically significantly enriched in a biological process $t$.
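
The text does not state which rewiring procedure was used. One common way to randomize a network while keeping every node's degree unchanged is the double-edge swap, sketched below in Python as an assumed technique; the ring network and the number of swaps are illustrative only.

```python
import random

def rewire(edges, n_swaps, seed=0):
    """Randomize an undirected network by double-edge swaps.

    Each swap turns (a, b), (c, d) into (a, d), (c, b), so every node keeps
    its degree. Swaps that would create self-loops or duplicate edges are skipped.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # try the other orientation of the second edge
        if len({a, b, c, d}) < 4:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

def degrees(edges):
    """Degree of every node in an edge list."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

# Hypothetical network: a ring of eight proteins.
edges = [(i, (i + 1) % 8) for i in range(8)]
rewired = rewire(edges, n_swaps=10)
```

After rewiring, the LCC proteins keep their identities and degrees, but their interaction partners differ, which is what the third probability tests against.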


Human-S. enterica and human-Y. pestis protein interaction sets

The human-S. enterica subsp. enterica serovar Typhimurium dataset consisted of 62 interactions
between 21 S. enterica virulence-associated proteins and 51 human proteins identified in several
small-scale experiments [12]. The majority of S. enterica proteins from this set were associated
with the bacterial T3SS. The human-Y. pestis PPI dataset consisted of a union of 204 interactions
identified by Y2H screens and 23 interactions identified in several small-scale experiments [13].
The combined human-Y. pestis PPI dataset contained 223 unique interactions between 69 Y. pestis
virulence-associated proteins and 125 human proteins. The majority of Y. pestis proteins
were also associated with the bacterial T3SS. For the basic comparison of the human-B. mallei,
human-S. enterica, and human-Y. pestis PPI networks’ characteristics, see S7 Table.


The HPIA algorithm

Network alignment algorithms [46-49] have been used previously to successfully identify both
conserved PPIs [50-55] and phylogenetic relationships between species [51-53]. Although the
existing network alignment algorithms can be applied to host-pathogen PPIs, these algorithms
are not optimized for inter-species alignment, i.e., they cannot differentiate between different
types of proteins, such as host and pathogen proteins, and consequently may map host proteins
to pathogen proteins and vice versa. The primary motivation for designing a network alignment
algorithm was to be able to use previous data and insights from other host-pathogen interaction
studies for interpreting our B. mallei host interaction data. If conserved network
motifs and interactions exist, we can use this information to infer/predict more complex roles
for proteins than just transferring sequence-based annotation information [47, 56]. Thus, we
designed the HPIA algorithm specifically for the alignment of host-pathogen interactions to
augment the sparsely annotated B. mallei protein data.

We have taken a number of considerations into account in designing the algorithm based
on the nature of the biological problem at hand. First, due to the non-exhaustive nature of Y2H
experimentation, the underlying interaction data are not complete [8]. Second, the selected
pathogen species are not identical to each other, i.e., proteins and biological processes have
evolved differently between the species. Because of this sparse and diverse nature of the
pathogen data at hand, we cannot a priori expect to obtain satisfactory alignments based on simply
mapping human proteins to each other. In this sense, a “perfect” alignment is never attainable,
and instead we must rely on approximate alignments with desirable properties, such as biological
consistency and sequence similarity. Hence, we developed an alternate approach for aligning
bipartite graphs for which one set of nodes (pathogen) is less well characterized than the
other (human).

For the limited number of host-pathogen interactions we have available for comparisons,
our algorithm attempts to identify interactions “conserved” on the functional level rather than
at the exact protein level. Thus, we can exploit the fact that all three host-pathogen PPI
networks contained interactions with human proteins that participated in similar biological
processes. In effect, this allowed us to extend the known annotations from the other networks to
the previously uncharacterized B. mallei virulence factors.

Furthermore, the algorithm guarantees that host proteins will be aligned only to host proteins
and that pathogen proteins will be aligned only to pathogen proteins.

Notation. Let $G_1(U_1, V_1, E_1)$ and $G_2(U_2, V_2, E_2)$ be two bipartite graphs (networks), where
$U_1$ and $V_1$ are two disjoint sets of nodes in $G_1$, $U_2$ and $V_2$ are two disjoint sets of
nodes in $G_2$, and $E_1$ and $E_2$ are sets of edges of $G_1$ and $G_2$ such that every edge in
$G_1$ connects a node in $U_1$ to one node in $V_1$, and every edge in $G_2$ connects a node in
$U_2$ to one node in $V_2$ (i.e., no two nodes within the same set are adjacent). Without loss of
generality, we can assume that $|U_1| \leq |U_2|$ and $|V_1| \leq |V_2|$ (hence, $G_1 \leq G_2$).
The HPIA algorithm is a global network alignment algorithm that uniquely matches each node from
$U_1$ to exactly one node in $U_2$, and each node from $V_1$ to exactly one node in $V_2$.
Formally, the alignment of $G_1$ to $G_2$ can be represented as a set of ordered pairs of two
kinds, $\{(u_1, u_2), (v_1, v_2)\}$, where $u_1 \in U_1$, $u_2 \in U_2$, $v_1 \in V_1$, and
$v_2 \in V_2$, and no two ordered pairs share a node.
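
To make the notation concrete, the following Python sketch represents a bipartite network and checks the two properties an alignment must satisfy: nodes are matched only within their own type, and the matching is 1-to-1. The class, the function, and the node names are hypothetical and are not part of the HPIA software.

```python
from dataclasses import dataclass

@dataclass
class BipartiteNetwork:
    pathogen: set   # U: pathogen proteins
    host: set       # V: host proteins
    edges: set      # E: (pathogen, host) pairs

    def __post_init__(self):
        # U and V must be disjoint, and every edge must join U to V.
        assert not (self.pathogen & self.host), "U and V must be disjoint"
        for u, v in self.edges:
            assert u in self.pathogen and v in self.host

def is_valid_alignment(g1, g2, pairs):
    """Check type preservation and the 1-to-1 property of an alignment."""
    used1, used2 = set(), set()
    for a, b in pairs:
        same_type = ((a in g1.pathogen and b in g2.pathogen) or
                     (a in g1.host and b in g2.host))
        if not same_type or a in used1 or b in used2:
            return False
        used1.add(a)
        used2.add(b)
    return True

g1 = BipartiteNetwork({"a1", "a2"}, {"h1", "h2"}, {("a1", "h1"), ("a2", "h2")})
g2 = BipartiteNetwork({"b1", "b2", "b3"}, {"k1", "k2", "k3"},
                      {("b1", "k1"), ("b2", "k2"), ("b3", "k3")})

ok = is_valid_alignment(g1, g2, [("a1", "b2"), ("a2", "b1"), ("h1", "k1")])
cross_type = is_valid_alignment(g1, g2, [("a1", "k1")])
reused = is_valid_alignment(g1, g2, [("a1", "b1"), ("a2", "b1")])
```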

For the host-pathogen interaction networks ($G_1$ and $G_2$), sets $U_1$ and $U_2$ correspond to
pathogen proteins, sets $V_1$ and $V_2$ correspond to host proteins, and sets $E_1$ and $E_2$
correspond to interactions between host and pathogen proteins. Thus, our host-pathogen interaction
set corresponds to the host-pathogen network, nodes correspond to proteins, and edges
correspond to host-pathogen interactions.

Algorithm description. The HPIA algorithm is a seed-and-extend algorithm that consists
of the following three steps: 1) pre-processing, 2) identification of local alignment, and 3)
identification of global alignment. In this context, we refer to “local alignment” as a smaller
area of local network similarity between two networks where not all nodes need to be included, and
“global” where all nodes from the smaller network must be aligned to nodes from the larger
network [28]. Both mappings are 1-to-1, i.e., one node from one network can be aligned to
only one node of the other network. In the first step, the algorithm reads in host-pathogen
networks and node annotations, and it calculates similarities between nodes based on either the
provided annotation (for node sets with available annotation) or the default topological
similarity (for node sets without available annotation). This step also includes handling
user-specified seed nodes (nodes that should be mapped to each other). The HPIA algorithm allows
a user to provide a set of node pairs (seeds) and/or to employ the algorithm’s feature to
automatically search for seed pairs (this search is based on the sequence similarity or equivalent
protein names). The HPIA algorithm can treat seed pairs in two ways: 1) as a suggestion for
node alignment, i.e., nodes that should be aligned to each other if other alignment constraints
are satisfied (see below), or 2) as a requirement for the alignment, i.e., seed pairs that have to
be aligned to each other. The first step also includes initialization of the aligned pairs list as
empty.

In the second step, the HPIA algorithm first identifies a pair of seed nodes $(s_1, s_2)$, where
$s_1 \in G_1$ and $s_2 \in G_2$, based on the node similarity measures from one of the following
three sets (in order of preference): 1) a set of aligned protein pairs in which both proteins are
adjacent to at least one unaligned protein, 2) a set of user-suggested seed pairs, and 3) a set of
unaligned proteins. All three of these sets contain proteins from the host and pathogen sets and,
thus, a pair of seed nodes can come from either the host or pathogen sets of proteins. However, if
there are seed node candidates from both sets, the HPIA algorithm preferentially selects a pair
from the pathogen set. Once selected, the seed pair $(s_1, s_2)$ is added to the list of aligned
pairs. Next, the HPIA algorithm expands around the seed nodes by greedily aligning their direct
neighbors $s_{1i}$ and $s_{2j}$ ($s_{1i} \in N[s_1]$ and $s_{2j} \in N[s_2]$), based on the given
node similarity measure (see below). The HPIA algorithm repeats step two while there exists at
least one unaligned pair of host proteins or pathogen proteins adjacent to at least one other
unaligned protein. When there are no such pairs left, HPIA proceeds to the third step.

In the third step, the HPIA algorithm greedily aligns all of the remaining unaligned pathogen
nodes in $G_1$ to unaligned pathogen nodes in $G_2$ and all of the remaining unaligned host nodes
in $G_1$ to unaligned host nodes in $G_2$, solely based on the node similarity measure. Each pair
of nodes is aligned one at a time based on the given node similarity (see below); network
connectivity information is not taken into account explicitly (only as a part of the node
similarity measure). The results of the alignment are lists of aligned nodes and edges and the
following alignment statistics: the total number and percentage of aligned nodes, the number and
percentage of aligned pathogen and host nodes, and the number and percentage of aligned edges.
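
The seed-and-extend idea behind steps two and three can be illustrated with a heavily simplified Python sketch. It is not the HPIA implementation: it starts from a single given seed pair, omits the seed preference order and user-suggested seeds, and breaks ties by node name rather than randomly. All function names, the toy networks, and the toy similarity function are assumptions made for the example.

```python
def neighbors(edges):
    """Adjacency sets built from (pathogen, host) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def greedy_match(cands1, cands2, sim, aligned, used2):
    """Greedily pair candidates by decreasing similarity, keeping the map 1-to-1."""
    scored = sorted(((sim(a, b), a, b) for a in cands1 for b in cands2),
                    key=lambda t: (-t[0], t[1], t[2]))
    for _, a, b in scored:
        if a not in aligned and b not in used2:
            aligned[a] = b
            used2.add(b)

def align(g1, g2, sim, seed):
    """g1, g2: dicts with 'pathogen' and 'host' node sets and an 'edges' set."""
    adj1, adj2 = neighbors(g1["edges"]), neighbors(g2["edges"])
    aligned, used2 = {seed[0]: seed[1]}, {seed[1]}
    frontier = [seed]
    # Step 2 (simplified): extend around each aligned pair through its neighbors.
    # In a bipartite graph the neighbors of a pathogen node are host nodes and
    # vice versa, so node types are preserved automatically.
    while frontier:
        s1, s2 = frontier.pop(0)
        free1 = sorted(n for n in adj1.get(s1, ()) if n not in aligned)
        free2 = sorted(n for n in adj2.get(s2, ()) if n not in used2)
        before = set(aligned)
        greedy_match(free1, free2, sim, aligned, used2)
        frontier += [(a, aligned[a]) for a in sorted(set(aligned) - before)]
    # Step 3: align the leftovers by similarity alone, one node type at a time.
    for kind in ("pathogen", "host"):
        rest1 = sorted(n for n in g1[kind] if n not in aligned)
        rest2 = sorted(n for n in g2[kind] if n not in used2)
        greedy_match(rest1, rest2, sim, aligned, used2)
    return aligned

g1 = {"pathogen": {"p1", "p2"}, "host": {"h1", "h2", "h3"},
      "edges": {("p1", "h1"), ("p1", "h2"), ("p2", "h2"), ("p2", "h3")}}
g2 = {"pathogen": {"q1", "q2"}, "host": {"k1", "k2", "k3", "k4"},
      "edges": {("q1", "k1"), ("q1", "k2"), ("q2", "k2"), ("q2", "k3")}}

# Toy similarity: nodes with the same index are considered similar.
toy_sim = lambda a, b: 1.0 if a[1:] == b[1:] else 0.0
alignment = align(g1, g2, toy_sim, seed=("p1", "q1"))
```

In this toy case every node of the smaller network is aligned during extension, and host node `k4` of the larger network remains unaligned.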

The HPIA algorithm allows a user to provide a set of node pairs (seeds) and to use an automatic
search for seed pairs (this search is based on the sequence similarity or equivalent protein
names) by specifying an optional parameter, “additionalSeeds.” The HPIA algorithm can treat
these seed pairs in two ways. If the “relaxSeeds” option is given, the nodes will be aligned to
each other only if other alignment constraints are satisfied as detailed below. If the “relaxSeeds”
option is not given, the seed pairs are forced to be aligned to each other. We recommend that
the “relaxSeeds” parameter be turned on if the “additionalSeeds” parameter is used.

All ties in the algorithm are broken randomly. The implementation of the HPIA algorithm
is also presented as pseudocode in S1 Text. Fig. 7 shows a high-level description of the HPIA
alignment algorithm.

Fig 7. Host-Pathogen Interactions Alignment (HPIA) algorithm. The HPIA algorithm is a seed-and-extend algorithm that aligns two bipartite graphs, e.g.,
two different host-pathogen protein interaction networks. A) Given an initial pair of seed nodes (red nodes U, and u,) from two graphs $G_1$ (left) and $G_2$ (right),
the algorithm first aligns seed nodes to each other. Then, it aligns the neighbors of the seed nodes from the first graph (green nodes Vi-Ve) to the neighbors
of the seed nodes in the second graph (green nodes v^-Vg) based on the node similarity measure (as defined in Equations 6, 8, and 9). This procedure results
in six aligned nodes and five aligned edges. B) The algorithm iteratively selects new seeds and extends around them, e.g., it selects nodes Ve from $G_1$ and Vg
from $G_2$ as new seed nodes and, based on the node similarity measure, aligns their unaligned neighbors U2 to U2, creating an additional aligned edge (U2-V6
to U2-Vg). C) When the algorithm cannot find any seed nodes of the same type that have unaligned neighbors, it greedily aligns all of the remaining unaligned
nodes based on their type and the node similarity measure. Some nodes may remain unaligned if the graphs’ sizes vary, e.g., when there is no match for Vg
from $G_2$ in $G_1$. The HPIA algorithm generates a list of aligned nodes and a list of aligned edges inferred from the aligned nodes.


doi:10.1371/journal.pcbi.1004088.g007 


Node similarity measures. The HPIA algorithm uses one or more topologically and biologically
based protein similarity measures to identify conserved interactions. Similarity between
two proteins can always be calculated based on at least one of the metrics and, thus, the
algorithm always has a metric to match one protein to another. If no node annotation is provided,
the HPIA algorithm uses the default topological similarity measure, $S_{DT}(n_1, n_2)$, to
calculate the similarity between nodes $n_1 \in G_1$ and $n_2 \in G_2$:

$$S_{DT}(n_1, n_2) = \alpha \times \frac{\min[\deg(n_1), \deg(n_2)]}{\max[\deg(n_1), \deg(n_2)]} + (1 - \alpha) \times \frac{\min[nd(n_1), nd(n_2)]}{\max[nd(n_1), nd(n_2)]} \tag{4}$$

where $\deg(n)$ denotes the degree of node $n$ and $nd(n)$ denotes the neighborhood density of $n$,
defined as

$$nd(n) = \sum_{n_j \in N[n]} \deg(n_j) \tag{5}$$

where $N[n] = N(n) \cup \{n\}$ represents the closed neighborhood of node $n$, i.e., the node $n$
and the set of its adjacent nodes (for a PPI network, this corresponds to a protein and all of its
interacting partners), and $\alpha$ is a parameter in [0, 1] that controls the contribution of the
degree of a node to its similarity function. We empirically selected $\alpha = 0.7$, as we wanted
to weight the number of direct host-pathogen interactions higher than the number of
host-pathogen-host or pathogen-host-pathogen interactions.
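
A small worked example may help with the default topological similarity. The Python sketch below assumes the measure is a weighted sum of a degree ratio (weight 0.7) and a neighborhood-density ratio (weight 0.3), with the neighborhood density taken as the sum of degrees over the closed neighborhood; the function names and the toy star networks are illustrative.

```python
def neighborhood_density(adj, n):
    """Sum of degrees over the closed neighborhood of n (n and its neighbors)."""
    return sum(len(adj[m]) for m in adj[n] | {n})

def ratio(x, y):
    """min/max ratio, defined as 0 when both values are 0."""
    return min(x, y) / max(x, y) if max(x, y) > 0 else 0.0

def topological_similarity(adj1, n1, adj2, n2, alpha=0.7):
    """Assumed form: alpha * degree ratio + (1 - alpha) * density ratio."""
    deg_term = ratio(len(adj1[n1]), len(adj2[n2]))
    nd_term = ratio(neighborhood_density(adj1, n1),
                    neighborhood_density(adj2, n2))
    return alpha * deg_term + (1 - alpha) * nd_term

# Hypothetical networks: pathogen protein "p" binds 2 host proteins, "q" binds 4.
adj1 = {"p": {"a", "b"}, "a": {"p"}, "b": {"p"}}
adj2 = {"q": {"w", "x", "y", "z"}, "w": {"q"}, "x": {"q"}, "y": {"q"}, "z": {"q"}}

# deg: 2 vs 4 (ratio 0.5); density: 4 vs 8 (ratio 0.5); similarity 0.5.
s_pq = topological_similarity(adj1, "p", adj2, "q")
s_pp = topological_similarity(adj1, "p", adj1, "p")
```

A node compared with itself scores 1, the maximum of this measure.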

If node annotation is provided, the HPIA algorithm defines a similarity between pathogen
proteins $u_1 \in U_1$ and $u_2 \in U_2$ as

$$S(u_1, u_2) = S_{\text{GDV-P}}(u_1, u_2) + S_{EV}(u_1, u_2) + S_{GO}(u_1, u_2) \tag{6}$$

and a similarity between host proteins $v_1 \in V_1$ and $v_2 \in V_2$ as

$$S(v_1, v_2) = S_{\text{GDV-P}}(v_1, v_2) + S_{\text{GDV-H}}(v_1, v_2) + S_{EV}(v_1, v_2) + S_{GO}(v_1, v_2) \tag{7}$$

where $S_{\text{GDV-P}}$ and $S_{\text{GDV-H}}$ denote the graphlet degree vector similarity [57]
derived from the host-pathogen interaction network and the host-PPI network, respectively;
$S_{EV}$ represents the BLAST expected value (E-value) similarity [58], defined as 1 - E-value for
E-values < 1 and 0 otherwise; and $S_{GO}$ denotes the GO term annotation similarity calculated
using the Jaccard similarity measure [59]. If a specific type of annotation is not provided, the
HPIA algorithm assigns the similarity value 0 to the corresponding similarity parameters, e.g., if
BLAST E-values are not provided, the value of $S_{EV}$ for all pairs of nodes is set to 0. We did
not add the graphlet degree vector similarity for the pathogen networks because only a few
pathogen-PPI networks are available, none of which are for B. mallei.
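
The E-value and GO components of the annotation-based similarity are simple to compute, as the Python sketch below shows. It assumes the pathogen similarity is the unweighted sum of its components and takes the graphlet degree vector similarity as a precomputed number; the function names, term labels, and input values are invented for the example.

```python
def evalue_similarity(e_value):
    """1 - E-value for E-values below 1, and 0 otherwise (also when missing)."""
    if e_value is None or e_value >= 1:
        return 0.0
    return 1.0 - e_value

def jaccard(terms_a, terms_b):
    """Jaccard similarity of two GO term sets; 0 when both sets are empty."""
    a, b = set(terms_a), set(terms_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def pathogen_similarity(gdv_sim, e_value, go_a, go_b):
    """Assumed unweighted sum of the three terms; missing annotation adds 0."""
    return gdv_sim + evalue_similarity(e_value) + jaccard(go_a, go_b)

# Two hypothetical pathogen proteins: GDV similarity 0.5, BLAST E-value 1e-5,
# and one shared GO term out of three distinct terms (placeholder labels).
s = pathogen_similarity(0.5, 1e-5, {"t1", "t2"}, {"t2", "t3"})

# Without BLAST or GO annotation, only the topological term remains.
s_topology_only = pathogen_similarity(0.5, None, set(), set())
```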

Data. For the topological node annotation, we used graphlet degree vectors [57] of all host
and pathogen proteins from the host-pathogen PPI networks. Host proteins found in the host
PPI network [25] were additionally annotated with another set of graphlet degree vectors
calculated based on the host PPI network topology. We used GO annotation [23] downloaded
from UNIPROT [60] [the lowest (leaf) level] as the biological node annotation. Additionally,
we used BLAST E-values of < 0.01 to define similarities between proteins [58]. Protein
sequences were downloaded from UNIPROT and aligned using BLAST pairwise sequence
alignment.

Alignment quality. To assess the topological quality of the alignment, we used edge correctness
(EC), defined as the percentage of edges in $G_1$ that were aligned to edges in $G_2$ [51]. To
assess the biological quality of the alignment, we evaluated whether the number of aligned
protein pairs that share one or more GO term(s) was statistically significant compared to the
number we could expect at random using the standard model of sampling without replacement,
as described in previous studies [51-53]. We used the same approach to assess the statistical
significance of the alignment of two bipartite networks, $G_1(U_1, V_1, E_1)$ and
$G_2(U_2, V_2, E_2)$, with the EC of x% (similar to the implementation described above [51-53]).
We aligned each pair of host-pathogen interaction networks 30 times and reported the average and
standard deviations of the alignment scores over all runs, as well as the best score (S8 Table).
We ascertained the robustness of the alignments with respect to the E-value cutoff and observed no
significant differences in the results when lowering the cutoff value from 10"^ to 10'^. Given
that all obtained alignments were of similar biological quality, we further refined our
prediction by using the alignment with the highest EC score to infer the role of
B. mallei proteins.
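
Edge correctness follows directly from its definition, as the Python sketch below illustrates. The function name, the networks, and the alignment are hypothetical.

```python
def edge_correctness(edges1, edges2, alignment):
    """Percentage of edges in the first network whose aligned endpoints
    form an edge in the second network."""
    target = {frozenset(e) for e in edges2}
    conserved = sum(
        1 for a, b in edges1
        if a in alignment and b in alignment
        and frozenset((alignment[a], alignment[b])) in target)
    return 100.0 * conserved / len(edges1)

edges1 = [("p1", "h1"), ("p1", "h2"), ("p2", "h2"), ("p2", "h3")]
edges2 = [("q1", "k1"), ("q1", "k2"), ("q2", "k2"), ("q2", "k4")]
alignment = {"p1": "q1", "p2": "q2", "h1": "k1", "h2": "k2", "h3": "k3"}

# Three of the four edges map onto edges of the second network.
ec = edge_correctness(edges1, edges2, alignment)
```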


Implementation and availability

All statistical analyses were performed in R. All networks were plotted using Cytoscape [61].
The cross-species network alignment algorithm was developed in C++. Executable files and
examples for the HPIA algorithm are provided at http://www.bhsai.org/downloads/hpia/.